AITopics | incoming weight

f0552f14388d95b19740dee809f5cad1-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 06:25:49 GMT

artificial intelligence, machine learning, neuron, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)


A Missing lemmas for the proof of Theorem 3.1

Neural Information Processing Systems, Feb-17-2026, 21:41:52 GMT

The following proof is from Daniely and Vardi [15], and we give it here for completeness. By Lemma A.1, there exists a DNF formula [...] We construct such an affine layer in Lemma A.2. At least one of the k size-n slices in z contains 0 more than once. We define the outputs of our affine layer as follows. Pr[z represents a hyperedge] = n(n-1)...(n-k+1) [...]

artificial intelligence, machine learning, neuron, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)


258be18e31c8188555c2ff05b4d542c3-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 20:34:43 GMT

neural network, node, unimportant node, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Asia > South Korea > Gyeonggi-do > Suwon (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Multi-Task Zipping via Layer-wise Neuron Sharing

Xiaoxi He, Zimu Zhou, Lothar Thiele

Neural Information Processing Systems, Nov-20-2025, 19:14:03 GMT

MTZ is also able to effectively merge multiple residual networks.

artificial intelligence, machine learning, neuron, (16 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Continual Learning with Node-Importance based Adaptive Group Sparse Regularization

Sangwon Jung

Neural Information Processing Systems, Oct-2-2025, 12:07:29 GMT

Our experimental contributions are multifold.

artificial intelligence, machine learning, node, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


The Recurrent Cascade-Correlation Architecture

Neural Information Processing Systems, Apr-6-2023, 19:43:40 GMT

Recurrent Cascade-Correlation (RCC) is a recurrent version of the Cascade-Correlation learning architecture of Fahlman and Lebiere [Fahlman, 1990]. RCC can learn from examples to map a sequence of inputs into a desired sequence of outputs. New hidden units with recurrent connections are added to the network as needed during training. In effect, the network builds up a finite-state machine tailored specifically for the current problem. RCC retains the advantages of Cascade-Correlation: fast learning, good generalization, automatic construction of a near-minimal multi-layered network, and incremental training.

fahlman, incoming weight, recurrent cascade-correlation architecture, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.40)


On Connectivity of Solutions in Deep Learning: The Role of Over-parameterization and Feature Quality

Nguyen, Quynh, Brechet, Pierre, Mondelli, Marco

arXiv.org Machine Learning, Feb-18-2021

It has been empirically observed that, in deep neural networks, the solutions found by stochastic gradient descent from different random initializations can be often connected by a path with low loss. Recent works have shed light on this intriguing phenomenon by assuming either the over-parameterization of the network or the dropout stability of the solutions. In this paper, we reconcile these two views and present a novel condition for ensuring the connectivity of two arbitrary points in parameter space. This condition is provably milder than dropout stability, and it provides a connection between the problem of finding low-loss paths and the memorization capacity of neural nets. This last point brings about a trade-off between the quality of features at each layer and the over-parameterization of the network. As an extreme example of this trade-off, we show that (i) if subsets of features at each layer are linearly separable, then almost no over-parameterization is needed, and (ii) under generic assumptions on the features at each layer, it suffices that the last two hidden layers have $\Omega(\sqrt{N})$ neurons, $N$ being the number of samples. Finally, we provide experimental evidence demonstrating that the presented condition is satisfied in practical settings even when dropout stability does not hold.

international conference, neural network, neuron, (13 more...)

arXiv.org Machine Learning

2102.09671

Country:

Europe > Germany (0.04)
Europe > Austria (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Interpreting a Penalty as the Influence of a Bayesian Prior

Wolinski, Pierre, Charpiat, Guillaume, Ollivier, Yann

arXiv.org Machine Learning, Feb-1-2020

For instance, penalties are used to improve generalization, prune neurons or reduce the rank of tensors of weights. Therefore, usual penalties are mostly empirical and user-defined, and integrated into the loss as follows: $L(w) = \ell(w) + r(w)$, with $w$ the vector of all parameters in the network, $\ell(w)$ the error term and $r(w)$ the penalty term. From a Bayesian point of view, optimizing such a loss $L$ is equivalent to finding the Maximum A Posteriori (MAP) of the parameters $w$ given the training data and a prior $\alpha \propto \exp(-r)$. Indeed, assuming that the loss $\ell$ is a log-likelihood loss, namely $\ell(w) = -\ln p_w(D)$ with dataset $D$, then minimizing $L$ is equivalent to minimizing $L_{\mathrm{MAP}}(w) = -\ln p_w(D) - \ln(\alpha(w))$. Thus, within the MAP framework, we can interpret the penalty term $r$ as the influence of a prior $\alpha$ [14]. However, the MAP approximates the Bayesian posterior very roughly, by taking its maximum. Variational Inference (VI) provides a variational posterior distribution rather than a single value, hopefully representing the Bayesian posterior much better. VI looks for the best posterior approximation within a family $\beta_u(w)$ of approximate posteriors over $w$, parameterized [...]

Author affiliations: Inria, Team TAU, Gif-sur-Yvette, France; Facebook, France.
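As a small illustration of the MAP reading described in this excerpt, the sketch below checks numerically that a penalty and the negative log of its matching prior differ only by a constant that does not depend on the weights. The L2 penalty, the isotropic Gaussian prior, and the function names `penalty`, `neg_log_gaussian_prior`, and `gap` are illustrative assumptions and are not taken from the paper.

```python
import math

def penalty(w, lam):
    """Assumed example penalty: r(w) = (lam / 2) * ||w||^2."""
    return 0.5 * lam * sum(x * x for x in w)

def neg_log_gaussian_prior(w, lam):
    """-ln alpha(w) for an isotropic Gaussian prior N(0, I / lam)."""
    d = len(w)
    # Log of the normalizing constant (2 * pi / lam)^(d / 2).
    log_norm = 0.5 * d * math.log(2 * math.pi / lam)
    return 0.5 * lam * sum(x * x for x in w) + log_norm

def gap(w, lam):
    """Difference -ln alpha(w) - r(w); it should not depend on w."""
    return neg_log_gaussian_prior(w, lam) - penalty(w, lam)

# Two unrelated weight vectors give the same gap, so minimizing
# loss + r(w) and minimizing loss - ln alpha(w) select the same w.
print(gap([0.3, -1.2], 2.0), gap([5.0, 0.1], 2.0))
```

Because the gap is constant in the weights, adding this penalty to a negative log-likelihood loss and adding the negative log-prior lead to the same minimizer, which is the sense in which the penalty acts as a prior under these assumptions.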

neural network, neuron, penalty, (17 more...)

arXiv.org Machine Learning

2002.00178

Country:

Europe > France (0.44)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)


Multi-Task Zipping via Layer-wise Neuron Sharing

He, Xiaoxi, Zhou, Zimu, Thiele, Lothar

Neural Information Processing Systems, Dec-31-2018

Future mobile devices are anticipated to perceive, understand and react to the world on their own by running multiple correlated deep neural networks on-device. Yet the complexity of these neural networks needs to be trimmed down both within-model and cross-model to fit in mobile storage and memory. Previous studies focus on squeezing the redundancy within a single neural network. In this work, we aim to reduce the redundancy across multiple models. We propose Multi-Task Zipping (MTZ), a framework to automatically merge correlated, pre-trained deep neural networks for cross-model compression. Central in MTZ is a layer-wise neuron sharing and incoming weight updating scheme that induces a minimal change in the error function. MTZ inherits information from each model and demands light retraining to re-boost the accuracy of individual tasks. Evaluations show that MTZ is able to fully merge the hidden layers of two VGG-16 networks with a 3.18% increase in the test error averaged on ImageNet and CelebA, or share 39.61% parameters between the two networks with <0.5% increase in the test errors for both tasks. The number of iterations to retrain the combined network is at least 17.8 times lower than that of training a single VGG-16 network. Moreover, experiments show that MTZ is also able to effectively merge multiple residual networks.

artificial intelligence, machine learning, neuron, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

